Information extraction and imprecise query answering from web documents

Authors

  • Muhammad Abulaish
  • Lipika Dey
Abstract

Word-based searches for relevant information in texts retrieve huge collections and burden the user with information overload. Ontology-based text information retrieval can perform concept-based search and extract only the relevant portions of text containing concepts that are present in the query or that are semantically linked to query concepts. While these systems retrieve with better precision than general-purpose search engines, problems arise in domains where ontological concepts cannot be unambiguously described using precise property descriptors. Besides, the ontological descriptors may not exactly match text descriptions or the user-given descriptors in the query. In such situations, uncertainty-based reasoning principles can be applied to find approximate matches to user queries. In this paper we present a framework to enhance traditional ontological structures with fuzzy descriptors. The fuzzy ontology structure is used to locate and extract both precise and imprecise descriptions of concepts from Web documents and then store them in a structured knowledge base. The design of the structured knowledge base, which in our case is a database, is also derived from the underlying fuzzy ontology representing the domain. User queries are processed in two stages. In the first stage, precise SQL queries are formulated and processed over the knowledge base to find a possible answer set. In the second stage, fuzzy reasoning is applied to compute the relevance of each answer in the answer set with respect to the query. We provide experimental validation of the approach through knowledge extraction and query processing executed over a diverse set of domains.
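
The two-stage query processing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `product` table, the `price` property, the fuzzy terms, and the trapezoidal membership functions are all hypothetical choices made for illustration. Stage 1 runs a precise SQL query to collect a possible answer set; stage 2 scores each candidate with a fuzzy membership degree and ranks the results.

```python
import sqlite3

# Hypothetical fuzzy descriptors for a hypothetical "price" property.
# Each term maps to a trapezoid (a, b, c, d): support [a, d], core [b, c].
FUZZY_TERMS = {
    "cheap": (0, 0, 100, 200),
    "moderate": (100, 200, 400, 500),
    "expensive": (400, 500, 10_000, 10_000),
}


def trapezoid(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def build_knowledge_base(rows):
    """Create an in-memory database standing in for the structured knowledge base."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL)")
    conn.executemany("INSERT INTO product VALUES (?, ?)", rows)
    return conn


def answer(conn, term):
    """Answer an imprecise query such as 'cheap' in two stages."""
    a, b, c, d = FUZZY_TERMS[term]
    # Stage 1: a precise SQL query over the support of the fuzzy term
    # yields the possible answer set.
    candidates = conn.execute(
        "SELECT name, price FROM product WHERE price BETWEEN ? AND ?",
        (a, d),
    ).fetchall()
    # Stage 2: fuzzy reasoning assigns each candidate a relevance degree.
    scored = [(name, trapezoid(price, a, b, c, d)) for name, price in candidates]
    relevant = [item for item in scored if item[1] > 0.0]
    return sorted(relevant, key=lambda item: -item[1])


if __name__ == "__main__":
    kb = build_knowledge_base([("A", 80.0), ("B", 150.0), ("C", 450.0)])
    print(answer(kb, "cheap"))
```

Restricting the SQL query to the support of the fuzzy set keeps stage 1 a crisp, index-friendly filter, while the graded relevance is computed only for the surviving candidates. In the framework the abstract describes, the descriptors and schema are derived from a fuzzy ontology rather than hard-coded as they are here.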

Similar resources

Resource Analysis for Question Answering

This paper attempts to analyze and bound the utility of various structured and unstructured resources in Question Answering, independent of a specific system or component. We quantify the degree to which gazetteers, web resources, encyclopedia, web documents and web-based query expansion can help Question Answering in general and specific question types in particular. Depending on which resourc...

Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System

Question answering (QA) system aims at retrieving precise information from a large collection of documents against a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on the ontological information, a step towards semantic web question answering. The proposed architecture defines four basic modules suitable for enhanc...

Opinion-Based Imprecise Query Answering

There is an exponential growth in user-generated contents in the form of customer reviews on the Web. But, most of the contents are stored in either unstructured or semi-structured format due to which distillation of knowledge from this huge repository is a challenging task. In addition, on analysis we found that most of the users use fuzzy terms instead of crisp terms to express opinions on pr...

Ad-Hoc Queries over Document Collections - A Case Study

We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000’s of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. “Google Squared” or our sys...

Queries over Document Collections - a Case Study (incomplete workshop discussion draft)

We discuss the novel problem of supporting analytical business intelligence queries over web-based textual content, e.g., BI-style reports based on 100.000’s of documents from an ad-hoc web search result. Neither conventional search engines nor conventional Business Intelligence and ETL tools address this problem, which lies at the intersection of their capabilities. “Google Squared” or our sys...

Journal title:
  • Web Intelligence and Agent Systems

Volume 4  Issue 

Pages  -

Publication date 2006